Shader Core
The 160 VLIW five-way superscalar shader processors are split into ten SIMD cores, each with 16 shader processors (or 80 stream processors). Each of these SIMD cores has its own thread sequencers and arbiters associated with it inside the Ultra Threaded Processor, just like in the R600 architecture – not a great deal has changed at the top of the chip.
Inside each SIMD core, there’s a 16KB local data store, which allows the shader processors inside that SIMD to communicate with one another without having to use up valuable texture cache space. Communication between SIMD cores is improved too, thanks to the inclusion of the global data store – the SIMD cores can access this via the high-speed data request bus.
And while we’re speaking of texturing, it is worth mentioning that there is a dedicated texture unit – complete with a common L1 texture cache – aligned to each SIMD core, meaning there are a total of ten texture units chip-wide.
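To keep the unit counts straight, here is a minimal sketch of the arithmetic in Python. The figures are the ones quoted above; the constant and function names are made up purely for illustration and don't correspond to anything in AMD's documentation.

```python
# Unit counts as quoted in the text; all names here are illustrative only.
SIMD_CORES = 10
SHADER_PROCESSORS_PER_SIMD = 16
UNITS_PER_SHADER_PROCESSOR = 5  # five-way VLIW: five instruction units each
TEXTURE_UNITS_PER_SIMD = 1      # one dedicated texture unit per SIMD core


def rv770_totals():
    """Derive the chip-wide totals from the per-SIMD figures."""
    shader_processors = SIMD_CORES * SHADER_PROCESSORS_PER_SIMD
    per_simd = SHADER_PROCESSORS_PER_SIMD * UNITS_PER_SHADER_PROCESSOR
    return {
        "shader_processors": shader_processors,
        "stream_processors_per_simd": per_simd,
        "stream_processors": shader_processors * UNITS_PER_SHADER_PROCESSOR,
        "texture_units": SIMD_CORES * TEXTURE_UNITS_PER_SIMD,
    }


if __name__ == "__main__":
    for name, count in rv770_totals().items():
        print(f"{name}: {count}")
```

The branch execution unit is deliberately left out of the tally, since AMD doesn't count it as a stream processor.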
An RV770 shader processor
Of the five instruction units in each superscalar shader processor, four can handle only a limited set of instructions per clock cycle: FP MAD, FP MUL, FP and INT ADD, and dot product calculations.
The fifth unit in each shader processor can’t handle dot products, integer ADD instructions or double precision, but can handle more types of instruction than the four thinner units – these include integer multiply and division, bit shifting, and transcendental instructions like SIN, COS and LOG. All of the shader operations are done with 32-bit precision.
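The split between the four thinner units and the fifth unit can be sketched as a simple capability table. This is only a toy model: the names "thin" and "fat", the `Op` enum and `eligible_units` are invented for the example. It also assumes the fifth unit can issue FP MAD, FP MUL and FP ADD, which the description above implies but doesn't state outright. Double precision is left out altogether.

```python
from enum import Enum, auto


class Op(Enum):
    FP_MAD = auto()
    FP_MUL = auto()
    FP_ADD = auto()
    INT_ADD = auto()
    DOT = auto()
    INT_MUL = auto()
    INT_DIV = auto()
    SHIFT = auto()
    SIN = auto()
    COS = auto()
    LOG = auto()


# Instructions the text lists for the four thinner units.
THIN_OPS = frozenset({Op.FP_MAD, Op.FP_MUL, Op.FP_ADD, Op.INT_ADD, Op.DOT})

# Fifth unit: no dot products or integer ADD, but it adds integer
# multiply/divide, shifts and transcendentals.
# ASSUMPTION: it also handles FP MAD/MUL/ADD (not spelled out in the text).
FAT_OPS = frozenset({
    Op.FP_MAD, Op.FP_MUL, Op.FP_ADD,
    Op.INT_MUL, Op.INT_DIV, Op.SHIFT,
    Op.SIN, Op.COS, Op.LOG,
})


def eligible_units(op):
    """Return the names of the units in one shader processor that can run op."""
    units = [f"thin{i}" for i in range(4)] if op in THIN_OPS else []
    if op in FAT_OPS:
        units.append("fat")
    return units


if __name__ == "__main__":
    for op in (Op.FP_MAD, Op.DOT, Op.SIN):
        print(op.name, eligible_units(op))
```

In this model a transcendental like SIN has exactly one unit to go to, while an FP MAD can be scheduled on any of the five. That is why the instruction mix matters so much to how well a five-way VLIW design is kept busy.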
The branch execution unit is also present inside RV770 and is tasked with the same flow control and conditional operations as before, but again AMD chooses not to count it when tallying up the number of stream processors. Register space per shader processor remains unchanged too, but because the number of shader processors has increased by two and a half times, total register space has increased by the same factor.
Despite this, the registers have changed slightly since RV670, because AMD has added a register overflow in order to help prevent shader processors from stalling quite so often when they’re waiting for data to return from another part of the GPU.
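The register scaling is simple proportionality, and a couple of lines of Python make it concrete. This sketch uses only the 160 shader processors and the two-and-a-half-times factor quoted above. Per-processor register space is treated as an unknown constant (the article gives no figure for it), and the function names are illustrative.

```python
RV770_SHADER_PROCESSORS = 160
SCALE_FACTOR = 2.5  # "two and a half times" more shader processors


def implied_previous_count():
    """Shader processor count implied for the previous design."""
    return RV770_SHADER_PROCESSORS / SCALE_FACTOR


def total_register_ratio(per_processor_ratio=1.0):
    """Total register space relative to the previous design.

    per_processor_ratio is 1.0 because per-processor register space
    is unchanged, so the total scales with the processor count alone.
    """
    return SCALE_FACTOR * per_processor_ratio


if __name__ == "__main__":
    print(implied_previous_count())
    print(total_register_ratio())
```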
Two of RV770's SIMD cores
So with all this in mind, AMD hasn’t changed an awful lot inside the execution units – they’re largely the same as they were with R600 and RV670, aside from the new 16KB data stores, register overflow and realignment of the texturing hardware. The changes aren’t immediately obvious from just looking at the architecture though – AMD said that it has increased the SIMD core’s performance per mm² by around 40 percent at this level. In other words, each of RV770’s execution units delivers the same performance from roughly 30 percent less die space (1 ÷ 1.4 ≈ 0.71), thanks to some design optimisation and crazy wizardry.
There’s one final question regarding the shader core that I’d like to know the answer to – how it stacks up in terms of performance. Let’s find out...